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Doubly stochastic Poisson processes, also known as the Cox pro- 
cesses, frequently occur in various scientific fields. In this article, mo- 
tivated primarily by analyzing Cox process data in biophysics, we 
propose a nonparametric kernel-based inference method. We conduct 
a detailed study, including an asymptotic analysis, of the proposed 
method, and provide guidelines for its practical use, introducing a 
fast and stable regression method for bandwidth selection. We apply 
our method to real photon arrival data from recent single-molecule 
biophysical experiments, investigating proteins' conformational dy- 
namics. Our result shows that conformational fluctuation is widely 
present in protein systems, and that the fluctuation covers a broad 
range of time scales, highlighting the dynamic and complex nature of 
proteins' structure. 

1. Introduction. Poisson processes, fundamental to statistics and prob- 
ability, have wide ranging applications in sciences and engineering. A special 
class of Poisson processes that researchers across different fields frequently 
encounter is the doubly stochastic Poisson process. Compared to the stan- 
dard Poisson process, a key feature of a doubly stochastic one is that its 
arrival rate is also stochastic. In other words, if we let N{t) denote the 
process and let A(t) denote the arrival rate, then, conditioning on A(t), 

N{t)\X{t) ~ inhomogeneous Poisson process with rate A(t), 

where A(t) itself is a stochastic process [Cox and Isham (1980); Daley and 
Vere-Jones (1988); Karr (1991); Karlin and Taylor (1981)]. In the litera- 
ture such processes are also referred to as Cox processes in honor of their 
discoverer [Cox (1955a, 1955b)]. 
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We consider the inference of Cox processes with large arrival rates in 
this article. Our study is primarily motivated by the frequent occurrences 
of Cox process data in biophysics and physical chemistry. In these fields, 
experimentalists commonly use fluorescence techniques to probe a biologi- 
cal system of interest [Krichevsky and Bonnet (2002)], where the system is 
placed under a laser beam, and the laser excites the system to emit pho- 
tons. The experimental data consist of photon arrival times with the arrival 
rate depending on the stochastic dynamics of the system under study (for 
example, the active and inactive states of an enzyme can have different pho- 
ton emission intensities). By analyzing the photon arrival data, one aims to 
learn the system's biological properties, such as conformational dynamics 
and reaction rates. 

Although we mainly focus on biophysical applications, we note that Cox 
processes also appear in other fields. In neuroscience Cox process data arise 
in the form of neural spike trains — a chain of action potentials emitted by 
a single neuron over a period of time [Gerstner and Kistler (2002)] — from 
which researchers seek to understand what information is conveyed in such a 
pattern of pulses, what code is used by neurons to transmit information, and 
how other neurons decode the signals, etc. [Bialek et al. (1991); Barbieri et al. 
(2005); Rieke et al. (1996)]. Astrophysics is another area where Cox process 
data often occur. For example, gamma-ray burst signals, pulsar arrival times 
and arrivals of high-energy photons [Meegan et al. (1992); Scargle (1998)] are 
studied to gain information about the position and motion of stars relative 
to the background [Carroll and Ostlie (2007)]. 

Previous statistical studies of Cox process data in the biophysics and 
chemistry literature mainly focus on constructing/analyzing parametric mod- 
els. For instance, continuous-time Markov chains and stationary Gaussian 
processes have been used to model the arrival rate \{t) for enzymatic reac- 
tions [English et al. (2006); Kou et al. (2005b); Kou (2008b)], DNA dynamics 
[Kou, Xie and Liu (2005a)], and proteins' conformational fluctuation [Min 
et al. (2005b); Kou and Xie (2004); Kou (2008a)]. 

Although effective for studying the stochastic dynamics of interest when 
they are correctly specified, parametric models are not always applicable for 
data analysis, especially when researchers (i) are in the early exploration 
of a new phenomenon, or (ii) are uncertain about the correctness of exist- 
ing models, and try to avoid drawing erroneous conclusions from misspeci- 
fied parametric models. Owing to its fiexibility and the intuitive appeal of 
"learning directly" from data, we focus on the nonparametric inference of 
Cox process data in this paper. In particular, we develop kernel based esti- 
mators for the arrival rate X{t) and its autocorrelation function (ACF). The 
ACF is of interest because it directly measures the strength of dependence 
and reveals the internal structure of the system. For example, for biophys- 
ical data, a fast decay of the ACF, such as an exponential decay, indicates 
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that the underlying biological process is Markovian and that the biomolecule 
under study has a relatively simple conformation dynamic, whereas a slow 
decay of ACF, such as a power-law decay, signifies a complicated process 
and points to an intricate internal structure/conformational dynamic of the 
biomolecule. Thus, in addition to discovering important characteristics of 
the stochastic dynamics under study, the autocorrelation function can also 
be used to test the validity of parametric models. 

Kernel smoothing and density estimates have been extensively developed 
in the last three decades; see, for example, Silverman (1986), Eubank (1988), 
Miiller (1998), Hardle (1990), Scott (1992), Wahba (1990), Wand and Jones 
(1994), Fan and Gijbels (1996) and Bowman and Azzalini (1997). Mean- 
while, kernel estimators of spatial point processes motivated by applications 
in epidemiology, ecology and environment studies have been proposed; see 
Diggle (1985, 2003), Stoyan and Stoyan (1994), Moller and Waagepetersen 
(2003), Guan, Sherman and Calvin (2004, 2006) and Guan (2007). Com- 
pared to these spatial applications, the Cox processes that we encounter in 
biophysics have some unique features: (i) the arrival rates are usually large 
because strong light sources, such as laser, are often used; (ii) the data size 
tends to be large, since one can often control the experimental duration; 
(iii) both short-range and long-range dependent processes can govern the 
underlying arrival rate. Consequently, the estimators designed for spatial 
point processes are not always applicable to biophysical data. For example, 
the asymptotic variance formulas derived in the spatial context do not work 
for high intensity photon arrival data. The general cross-validation method 
for bandwidth selection [Guan (2007)], due to its intense computation, does 
not work well either for large photon arrival data. Furthermore, because 
of the large arrival rate, the statistical performance of the kernel estimate 
depends not only on the Poisson variation of N{t) given X(t) but, more im- 
portantly, on the stochastic properties of \{t). For instance, we shall see in 
Section 4 that the kernel estimate of ACF will have asymptotically normal 
distribution only if A(t) has short-range dependence. 

Similar to classical kernel estimation, there is a bandwidth selection prob- 
lem associated with kernel inference of Cox process data. Using the mean in- 
tegrated square error (MISE) criterion [Marron and Tsybakov (1995); Jones, 
Marron and Sheather (1996); Grund, Hall and Marron (1994); Marron and 
Wand (1992); Park and Turlach (1992); Diggle (2003)], we propose a stable 
and fast regression plug-in method to choose the bandwidth. 

As our study is motivated by the analysis of scientific data, we apply our 
method to photon arrival data from real biophysical experiments. The result 
from our nonparametric inference helps elucidate the stochastic dynamics of 
proteins. In particular, our results show that as proteins (such as enzymes) 
spontaneously change their three-dimensional conformation, the conforma- 
tional fluctuation covers a very broad range of time scales, highlighting the 
complexity of proteins' conformational dynamics. 
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The rest of the paper is organized as follows. Section 2 considers kernel 
estimation of the arrival rate A(i). Section 3 focuses on estimating the ACF 
of A(t), and provides some guidelines for practical estimation. Section 4 
investigates the asymptotic distribution of our kernel estimates, laying down 
the results for confidence interval construction. In Section 5 we apply our 
method to simulated data and photon arrival data from two biophysical 
experiments. We conclude in Section 6 with some discussion and future work. 
The technical proofs are provided in the supplementary material [Zhang and 
Kou (2010)]. 

2. Kernel estimation of the arrival rate. 

2.1. The estimator. Suppose within a time window [0,T] a sequence of 
arrival times si,S2, . . . ,sk has been observed from a Cox process N{t), which 
has stochastic arrival rates A(t). The goal is to infer from the arrival times 
the stochastic properties of X{t). To do so, we assume the following: 

Assumption 1. The arrival rate A(t) is a stationary and ergodic process 
with finite fourth moments. 

Stationarity (i.e., the distribution of {X{t),t G R} is time-shift invariant) 
and ergodicity (i.e., essentially ^ X{s) ds ^ E[X{0)], as T —)■ oo) are both 
natural and necessary for making nonparametric inference of X{t) from a 
single sequence of arrival data. Assumption 1 is particularly relevant for 
single-molecule biophysical experiments [Kou (2009)] in which the system 
under study is typically in equilibrium or steady state. 

With Assumption 1, we now construct a kernel based arrival rate estima- 
tor 

K 1 /I \ 

(2.1) Xh{t)=Y,fhit,Si), with hit, s) = -f(-{s-t) I 
i=i ^ ^ 

where / is a symmetric density function, and h is the bandwidth. When / 
is taken to be the uniform kernel, Xhit) amounts to the binning-counting 
method used in the biophysics literature [Yang and Xie (2002a, 2002b)], 
in which A(t) is estimated by the number of data points falling into the 
bin containing t divided by the bin width [see also Diggle (1985); Berman 
and Diggle (1989)]. One undesirable consequence of uniform kernel is that, 
as points move in and out of the bins, Xfi{t) is artificially discontinuous. 
We thus consider general /, and without loss of generality, we assume the 
following: 

Assumption 2. / is a density function symmetric around with bounded 
support [—6,6]. 
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The assumption of bounded support in fact can be relaxed — essentially 
all the results in this paper can be extended to kernels with unbounded 
supports. However, to make the theory more presentable and to reduce the 
length of algebra, we will work with Assumption 2. 

When t is getting too close to the boundaries of the observational time 
window [0,T], there are apparently not enough data to estimate X{t) accu- 
rately. One method is to use end correction [see Diggle (1985); Berman and 

Diggle (1989)]: A^(t) = Y.i=ifhit,Si)/ Jq fh{t,s)ds, which is identical to 
(2.1) if t G [bh,T — bh]. However, the variance of the end-corrected-estimate 
X^{t) tends to be large when t is close to or T. We, instead, estimate X{t) 
by 

f K 

Y,fh{t,Si), ifbh<t<T-bh, 

i=l 

Xhibh), ifO<t<bh, 
[.Xh{T-bh), ifT -bh<t<T, 



(2.2) Xh{t) = { 



that is, we use Xh(bh) and Xh{T — bh) to approximate X{t) near the bound- 
aries. We shall see shortly (Table 1) that for typical biophysical data the 
bandwidth h is quite small; thus, the bias of (2.2) is also small. 

Since the choice of the bandwidth h affects the performance of the kernel 
estimate, we next determine the optimal h that gives the smallest mean 
integrated square error (MISE) 

MISEf{h) = E(^^ j\xhit)-X{t)fdt 

Owing to the stationarity and ergodicity of X{t), we have 
MISEfih) = EiXhito) - X{to)f + 0{h/T), 

where to is any number within [bh,T — bh], say, to = T/2, and the 0{h/T) 
term arises from the boundary of [0, T]. Hence, minimizing the MISE amounts 
to minimizing the MSE of Xh{to). 

Let C{t) denote the ACF of the arrival rate X{t): C{t) = cov(A(0), A(t)). 
To find the optimal bandwidth that minimizes the MSE of A/i(to), we make 
one more assumption. 

Assumption 3. The ACF C{t) is twice continuously differentiable for 
t > 0, and has nonzero right derivative at 0, that is, C"(0"'") = \im.g^Q+{C{s) — 
C(0))/s exits and is nonzero. 



This assumption reflects the fact that the arrival rate process A(t) in 
real experiments is usually not differentiable [Parzen (1962), Chapter 3]; for 
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example, X{t) could be a finite-state continuous Markov chain, whose path 
consists of piecewise jumps and whose ACF is a mixture of exponential 
functions, which are nondifferentiable at zero, or X{t) could be a functional 
of a stationary nondifferentiable Gaussian process, such as the Ornstein- 
Uhlenbeck process (representing a harmonic oscillator). 

Theorem 2.1. Under Assumptions 1-3, the MSE of Xh{to) is given by 

(2.3) ECXhito)-X{to)f = ^EiXm j'j\r)dr + hC'{Q^)^f + R2{h), 

where the constant 

/b f'h pb 
I f{ri)f{r2)\ri-r2\dridr2-2 f{r)\r\dr<0, 
-bJ-b J-b 



and 



R2{h)= t t f{n)f{r2) '^'^''{\n-r2\h-s)C"{s)dsdndr2 



b 



f{r) / {\r\h- s)C"{s)dsdr = o{h). 







The optimal h that minimizes the sum of the first two terms (i.e., the main 
terms) of the right-hand side of (2.3) is given by 



(2.5) h, 



opt 



C"(0+)7/i_i 



The constant jf is strictly negative as long as / is a density function. 
R2{h) is the remainder term. For data with large arrival rates, /lopt is small, 
and R2{hopt) contributes little to the MSE. The proof of the theorem is 
given in the supplementary material [Zhang and Kou (2010)]. 

Since /lopt involves unknown quantities, for real applications we use a 
regression based plug-in method to estimate it. First, ^ = E{X{Q)) is unbi- 
asedly estimated by /i = KjT ^ the total number of arrivals divided by the 
time window length, because 

i?(A) = \E{E\k\Xm = ^El^j\{t)dt^=E{Xm. 

Next, we use a regression method to estimate C"(0"'"). We will discuss this re- 
gression estimate in detail in Section 3.2 when we study the ACF estimation. 
Plugging fi and C"(0"'") into (2.5) yields our estimate /lopt- 
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2.2. Numerical illustration. We use two simulation examples to illustrate 
our method. In the first example, the arrival rate A(i) of the Cox process 
follows a continuous-time two-state Markov chain, which can be depicted as 

ki 

(2.6) A^B, 

k2 

where ki and k2 represent the transition rates between the two states A 
and B. This model has been used in the chemistry and biophysics literature 
[Reilly and Skinner (1994)] to model spectral and fluorescence data from 
two-level systems, such as the open-close of a DNA hairpin [Kou, Xie and 
Liu (2005a)], and the on-off of ion channels [Hawkes (2005); Sakmann and 
Neher (1995)]. We set the transition rates ki = 2, k2 = 5 and arrival rates 
= 1000 and = 400 respectively at states A and B in the simulation; 
the observational time T = 500. These numbers are taken to mimic a typical 
photon arrival experiment in biophysics. We generate a realization of A(t) 
from the two-state model and then the arrival times Sj, i = 1, . . . ,K on top 
of it. The true mean arrival rate fi = E{X{0)) = (/c2Aa + ki\B)/{ki + ^2) is 
equal to 828.57 in this case. The simulated data has the empirical mean 
arrival rate ji = K/T = 823.11. 

Figure 1(a) shows the estimate A^ ^(t), compared with the true \{t), 

based on the Epanechnikov kernel f{t) = |(1 — f^)/(|i| < 1). Figure 1(a) 
represents a typical result. We see that A(i) is well recovered. Table 1 
summarizes the results based on 100 independent simulations for applying 
our method with four different kernels: the uniform, Epanechnikov, trian- 
gular f{t) = (1 - \t\)I{\t\ < 1) and quartic f{t) = i|(l - t^)^I{\t\ < 1) ker- 
nels. The second and third columns present the optimal bandwidth /lopt for 
each kernel and their estimates /lopt obtained through the regression plug- 
in procedure. The next four columns show the normalized empirical MISE 
7^ Io(^h{t) - Xit))^dt for h = hopt, hopt, /iopt/2 and 2/iopt respectively. It 
is noticeable that (i) the regression plug-in method for approximating hopt 
works well; in particular, the empirical MISE with the estimated /lopt is close 
to that of hopt; (ii) the performance of the kernel method largely depends 
on the choice of the bandwidth; if one uses a nonoptimal bandwidth, such 
as twice or half /lopt; the error can increase as much as 30%; (iii) the choice 
of the kernel is not as crucial as that of the bandwidth, which echoes the 
classical result in kernel density estimation; and (iv) the widely used binning 
method, which is equivalent to using the uniform kernel, gives the largest 
error. 

In the second example, the arrival rate \{t) follows a log Gaussian pro- 
cess: X{t) = M exp{W{t)), where M > and W{t) is a stationary zero-mean 
Gaussian process with the autocovariance function 7(i). It is straightfor- 
ward to obtain /i = E{X{t)) = Me'^^o)/^ ^nd C{t) = M'^e<^^\e<^*'^ - 1). We 
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Epanechnikov Kernel Estimate 
Hidden Arrival Rate 



i 



1(1 




(a) Two State Markov Chain 



Epanechnil<ov Kernel Estimate 
Hidden Arrival Rate 



(b) Log Gaussian Process 



Fig. 1. Arrival rate estimation, (a) Two state Markov chain with fi = 828.57 and 
T = 500. (b) Log Gaussian X{t) with 7(f) = 1/(1 + \t\f and T = 1500. 



take 7(t) = l/(l + a|t|)-'^ so that C{t) decreases at the order . Both a and 
H are positive constants. The log Gaussian process has been used to model 
the conformational dynamics and reactivity of enzyme molecules [Min et 
al. (2005a); Kou and Xie (2004)]. For instance, Kou and Xie (2004) showed 
that an enzyme's conformational fluctuation can be modeled by a general- 
ized Langevin equation, in which the A(t) follows a log Gaussian process 
with the ACF having a power law decay C{t) ~ . Here, we take H = 6, 
a = 1, M = 1000 and T = 1500 in our simulation to mimic a real photon 
arrival data of this kind. Figure 1(b) compares the estimate ^{t) to the 
true A(t) for the Epanechnikov kernel. We repeat the simulation 100 times. 
Figure 1(b) represents a typical outcome. We see that X{t) is well recovered. 
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Table 1 

Kernel estimates of the arrival rate for the two-state continuous-time Markov chain 
model. The second and third columns present the optimal bandwidth hopt and the mean of 
their estimates hopt based on 100 simulations. The next several columns show the mean 
of the normalized empirical MISE vjr^ Jq {^h(t) — A(t))^ dt for h = hopt, hopt, /iopt/2 and 
2/iopt respectively. The numbers in the brackets are the associated standard deviations 

Bandwidth (in 10~^) Normalized empirical MISE (in 10~^) 



Kernel / 


hopt 


hopt 


Uniform 


4.93 


5.19 (0.34) 


Epanechnikov 


6.40 


6.67 (0.50) 


Quartic 


7.74 


8.07 (0.66) 


Triangular 


7.33 


7.64 (0.63) 



^opt ^opt ^opt 

/2 2/iopt 



2.43 (0.057) 2.43 (0.059) 3.05 (0.050) 2.93 (0.092) 

2.23 (0.053) 2.24 (0.055) 2.82 (0.048) 2.70 (0.085) 

2.20 (0.052) 2.21 (0.055) 2.78 (0.047) 2.65 (0.083) 

2.17 (0.052) 2.17 (0.054) 2.74 (0.047) 2.61 (0.082) 



Table 2 

Kernel estimates of the arrival rate for the log Gaussian model with C{t) = 1000^ 
X (exp(l/(l + \t\)^ + 1) — e). The second and third columns present hopt and the mean of 
the estimate hopt based on 100 simulations. The next four columns show the mean of the 

normalized empirical MISE for h = hopt, hopt, /iopt/2 and 2hopt respectively. The 
numbers in the brackets are the associated standard deviations 

Bandwidth (in 10~^) Normalized empirical MISE (in 10~^) 

Kernel / /lopt hopt hopt h 

opt ^opt 

/2 2 hopt 



Uniform 
Epanechnikov 
Quartic 
Triangular 



7.49 
9.73 

11.8 

11.1 



7.68 (0.27) 
10.1 (0.28) 
12.1 (0.32) 
11.4 (0.30) 



8.08 (0.16) 
7.45 (0.15) 
7.34 (0.15) 
7.23 (0.15) 



8.20 (0.19) 
7.45 (0.15) 
7.34 (0.15) 
7.23 (0.15) 



10.3 (0.17) 
9.33 (0.15) 
9.17 (0.15) 
9.12 (0.15) 



10.1 (0.32) 
9.26 (0.30) 
9.14 (0.29) 
8.97 (0.29) 



The detailed estimation results are summarized in Table 2. Again, we can 
see that the regression plug-in method for estimating /lopt works well and 
that the performance of the kernel method depends largely on the choice of 
the bandwidth and less so on the kernels. The widely used binning method 
again gives the poorest result. 

3. Estimating the ACF. 

3.1. Kernel estimation. In this section we consider kernel estimation of 
the ACF C{t) of the arrival rate. The ACF is useful in exploring the de- 
pendence structure of new stochastic dynamics and identifying appropriate 
parametric models for the data. 

For example, most ion channel dynamics and most chemical reactions 
involve reversible transitions among the various discrete chemical states in 
which the system can exist. In these systems, a fast decay of the ACF, such as 
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an exponential decay, indicates that the transition among the discrete states 
has a short memory, and the underlying biological process has a relatively 
simple mechanism, such as having only two or three states. In the case of 
ion channels, the simplest dynamic consists of a transition between a single 
shut state of the ion channel and a single open state [Sakmann and Neher 
(1995), Hawkes (2005)], in which the ACF is a single exponential function 
over time. In the case of a protein's conformational fluctuation, the simplest 
scenario is a transition between two distinct conformation states (where 
the protein reversibly and spontaneously crosses the energy barrier that 
separates the two states). In the case of enzyme catalytic fluctuations, the 
simplest scenario is that the enzyme interconverts among a small numbers 
of states, in which the ACF has a near exponential decay [Schenter, Lu and 
Xie (1999); Yang and Xie (2002a, 2002b); Kou, Xie and Liu (2005a); Kou 
et al. (2005b)]. 

A slow decay of ACF, such as a power-law decay, on the other hand, 
signifies a complicated process and points to an intricate internal structure, 
such as the existence of a large number of conformation states or the presence 
of a complicated energy landscape [Kou and Xie (2004); Min et al. (2005b)]. 

To ease the presentation, we first consider the situation where the mean 
arrival rate = E{X{0)) is known, and later relax the results for unknown fi. 
The basic idea is as follows. If we actually observe the realization of A(t), then 
using its ergodicity property, we have a natural estimate *(A(s) — 

fi){X{s + t) — fi)ds for C{t). Now A(s) is unobserved; we replace it by Xh{s). 
To avoid the bias at the boundary of the observation window, our kernel 
estimate of C{t) is 



The next two lemmas tell us the bias and variance of Cfj,^hit) for estimating 
C{t) at a fixed t. 

Lemma 3.1. Under Assumptions 1 and 2, 



(3.1) 




tG [0,r-26/i). 




+ / C{\t + {r-m)h\)f{r)f{m)drdm 



Furthermore, if Assumption 3 also holds, then 
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t > 2bh; 



te[0,2bh). 

The bias of the ACF estimate Cfj_^h{i) is due to the fact that \{t) is es- 
timated by "borrowing" information from the neighboring regions. When 
t < 2bh, the data points used to calculate Xh{s) and Xh{s + 1) overlap, re- 
sulting in the extra bias n J^^f{r + t/h)f{r) dr /h. 

For notational convenience, we denote 

C3{ti,t2) = E{{X{0) - ^)(A(ti) - fi){X{t2) - fi)}, 
v{ti,t2, s'-s) = cov{(A(s) - /i)(A(s + h) - /i), (A(s') - i2){X{s' + ta) - ^^)}■ 

Because of the stationarity of A(t), C3{ti,t2) = C3{t2,ti) and v{ti,t2,s' — 
s) = v{t2,ti,s — s'). The following two technical assumptions are needed to 
characterize the asymptotic behavior of var(C^j,,(t)). 

Assumption 4. The three-step correlation C3(ti,t2) is continuous and 
satisfies 

lim C3{ti,t2) = for any fixed ti. 

\t2\-^00 

Assumption 5. The cross correlation v{ti,t2,s) is continuous and sat- 
isfies 

lim v{ti,t2,s) = for any fixed ti and t2- 

\s\—^oo 

These two assumptions reflect the intuitive idea that as time-points move 
far away from each other, their dependence should eventually vanish. They 
are satisfied by most stationary and ergodic processes that one encounters in 
practice, such as continuous-time finite-state Markov Chains and functionals 
of stationary and ergodic Gaussian processes. 

Lemma 3.2. The variance of C^^hit) can be decomposed as 
(3.3) va.i{C^,h{t)) = vaT{E{C^,h{t)\Xm + i5;{var(C'^,,,(t)| A(-))}. 
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C{t) + C"{t)h^ r^f{r)dr + o{h^), 
J-b 

C(0) + ^ j'j(^r+^^f{r)dr 

/•b j-b 

+ C'(0+) / / \t+{r-m)h\f{r)f{m)drdm 



+ o{h), 



b 
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Under Assumptions 1, 2, 4 o-n-d 5, 
var{£;(C'^,,(t)|A(-))} 

(3.4) 



(T - 2bh 



where 



<T= I! 

J Jib 



[bh,T~bh--t]'2 




■b,b]* 



v{t + {1- ni)h, t + {l' - m')h, 

s' -s + {l' -l)h)f{l) 
X f{m)f{l')f{m') dl dm dl' dm' 



ds ds', 



B 



t,T 



[bh,T~bh~tp 




Csit + {I' - m')h, 



II III 

J J[bh,T-bh-t]^ U J\- 



s- s' + t+{l-m')h)f{l') 
X f{m')f{l)f(l + ^ dl dm' dl' ds ds', 

C{s' - s + {l' -l)h)f{l) 



and 

(3.5) E{vaT{C^,h{t)\Xm 



x/(/ + ;^)/(/')/(/' + ;^l dl dl' 



1 



ds ds'. 



T-2bh-t 



D^ + -E^ + —Fl' ]+0 



2^2 



where the three terms D^,E^ and Fj^ do not depend on T. Their exact hut 
lengthy expressions, involving multiple integrals, are given in the supplemen- 
tary material [Zhang and Kou (2010)]. 

Equation (3.3) indicates that the variance of the ACF estimate arises from 
two sources: the Poisson variation — E{\ax{C ^^h{i)\X{-))} — and the variation 
from \{t)-Yaj:{E{C^^h{t)\K-))}- At,T is the main part of var{£'(C'^_ft(t)|A(-))}. 
When t > 2bh, B^j. and C^j. both equal zero. 

Based on Lemmas 3.1 and 3.2, the next theorem tells us that as the 
observation time T gets larger, Cf^^h{t) consistently estimates C{t). 

Theorem 3.3. Suppose that C{t) is a continuous function oft G [0,oo) 
and that Assumptions 1, 2, 4 and 5 hold. Then for any fixed t>0, as T -h^ 
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oo and /i — )■ 0, 
so, in particular, 

C^,h{t) — ^ C{t) in probability. 

The assumption of continuous C{t) is satisfied for general continuous-time 
stationary and ergodic processes. We note that in the context of spatial point 
processes, different estimates of the covariance function have been proposed 
[see, e.g., Stoyan and Stoyan (1994) and Diggle (2003)]. We use (3.1) here 
mainly due to its internal coherency: an estimate of X(t) naturally leads to 
an estimate of C (t) . 

3.2. Practical consideration. To use the kernel estimate in practice, a 
few issues arise naturally. 

Unknown fi. In real applications, the mean arrival rate ^ is unknown. 
Employing its unbiased estimate fi = K/T, we use 

rT-bh-t 

Cf.,hit) = T_2bh-t J,i, + " '"^^^^^'^^ " ^) 

to estimate the ACF. A question follows immediately: is Cf^,h{t) still a con- 
sistent estimator? The next theorem provides a positive answer. 

Theorem 3.4. Suppose that C{t) is a continuous function of t £ [0,oo) 
and that Assumptions 1, 2, 4 md 5 hold. Then for any fixed t>0, as T -h^ 
oo and /i — )• 0, 

Cfi,h{t) — ^ C(t) in probability. 

Bias correction for small t. From Lemma 3.1, we see that C^^h{t) has 
an extra bias /i /(r + t/h)f{r) dr/h for t < 2bh. A bias correction can be 
conducted for t < 2hh, yielding 

(3.6) Cf,,n{t) = Cf,,n{t)-^ jj(^r+^^f{r)dr. 

Estimating /lopt- In Section 2 we briefly described how to estimate /lopt 
to recover the arrival rate, where the key is to estimate the derivative 
C'(0+). With Lemma 3.1 established, we now explain our estimate in detail. 
Lemma 3.1 tells us that, for small t, the expectation of Cjx^h{t) depends on 
j-h I-b 1^ + ('^ ~ ^)h\f{r)f{m) drdm linearly with C"(0''") as the slope. This 
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suggests that we can calculate Yi = Cf^^hiii) for evenly spaced U G [0,26/i), 
say, ten points, and regress Yi on Xi = /^^ l^i + — T^T')h\f{r)f{m) dr dm. 
The regression slope is our estimate of C(0"'"). Compared to the naive idea 
of using a numerical derivative (C'(A) — C'(0))/A for some small A to ap- 
proximate C"(0^), this regression estimate is not only easy to implement, 
but, more importantly, much stabler in performance (see Figure 2). 

For the calculation of C^,h(^i)) one needs to start from an initial h. We 
use h = p/il = pT/K, where the constant p (for example, between 3 and 
10) is the average number of data points falling in an interval of length h. 
This choice of initial h ensures that there are enough points in the kernel 
to give reliable Cfi^h{ti). Throughout our simulation and real data analysis, 
where fi is in the hundreds, we found that taking p between 3 and 10 gives 
almost identical results. Figure 2 shows how our estimate /lopt behaves for 
the two simulation examples of Section 2: the two-state Markov chain and 
the log Gaussian process model. The dotted vertical line is the true /lopt- 
The histograms in Figure 2 are based on 1000 i.i.d. replications of the Cox 
process. The estimate /lopt is seen to be stable and close to /lopt- 



The bandwidth h for estimating C{t). To estimate the ACF C{t), a nat- 
ural question is the choice of h. It could be different from that associated 
with recovering the arrival rate. One approach might be as follows: based 
on the results of Lemmas 3.1 and 3.2, find the asymptotic leading terms 
in bias-square and variance, and then search for h to optimize their lin- 
ear combination. Although conceptually "simple" , this approach in fact has 



Histogram of estimated optimal bandwidth 



Theoretically Optimal Bandwidth 
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Histogram of estimated optimal bandwidth 

• Theoretically Optimal Bandwidth — 



0.0085 0.0090 0.0095 0.01 00 0.01 05 0.01 1 
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(a) Two State Markov Chain (b) Log Gaussian Process with H — 6 

Fig. 2. Estimating the optimal bandwidth with the Epanechnikov kernel. 
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several major difficulties that make it ineffective for practical use: (a) As 
the equations in Lemmas 3.1 and 3.2 involve fourth moment and covariance 
of X{t), one needs to estimate them. Since the estimates of high moments 
often have large variability, the resulting h tends to be highly variable, (b) 
The bias and variance formulas depend on the specific value of t, which 
implies that for each t there is an h. Consequently, if the entire curve C{t) 
is of interest (as in many scientific studies), the computation becomes very 
intensive, (c) In order for the bias-squared to become comparable to the 
variance and in order for the asymptotics to take effect, h needs to be much 
smaller than {C{t) + jj?) / {jT v{t,t,s) ds), which in turn requires T to be 
quite large: r~0((C(t) + fj?)/{C"{t)'^h^)). However, real biophysical data 
with large /i and moderate T hardly satisfy this requirement, (d) In order to 
identify the asymptotic leading terms, more technical assumptions, such as 
short-range dependence of X{t) [i.e., C{t)dt < oo], have to be imposed, 
which restricts the estimate's general applicability. 

For these reasons, we recommend using hopt for t > 26/iopt and a smaller 
bandwidth h = min(p//i, /lopt), where p G [3, 10] for t < 26/iopt to estimate the 
ACF C{t). The reason to use min(p//i, /lopt) instead of hopt for t < 26/iopt is 
that for large mean arrival rate n, p/fl can be smaller than /lopt! in this case 
Lemma 3.1 tells us that for small t the bias of C^,/i(t) from h = p/fi tends to 
be smaller than that of /lopti while Lemma 3.2 indicates that the variances of 
the two are about the same. Thus, for small t, inm{p/fl, /lopt) appears to be a 
better choice. Although our bandwidth recommendation does not guarantee 
the smallest MSE for C{t) at every t, it does offer a stable and easy-to- 
compute bandwidth. We will demonstrate the effectiveness of this choice in 
Section 5 (see Table 3) when we study confidence interval construction. 

Approximating the variance of C^^hit)- For estimating the variance of 
C^,h(t) (e.g., in confidence interval construction), one can in principle use 
Lemma 3.2, replacing the unknown quantities with their empirical coun- 
terparts. However, this approach does not work well for the real data that 
we have tried for two reasons: (a) Multiple integrals on empirical third or 
higher moments tend to be highly variable, (b) The computing demands are 
quite high given the many multiple integrals involved. Fortunately, we find 
an efficient shortcut. First, when the mean arrival rate fi is large, variation 
from the underlying stochastic arrival rate dominates in the variance decom- 
position (3.3): Yar{E{C^^h{t)\\{-))} :$> E{vaiiC^^h{t)\X{-))}. Proposition 3.5 
below gives a theoretical justification. Second, for the real biophysical ex- 
perimental data that we have tried, var{£'(C'^ /i(t)| A(-))} accounts for more 
than 95% of the total variance var(C'^ /^(i)). Furthermore, in these real data, 
A^^r^/{T - 2bh - tf in the decomposition (3.4) of \ai{E{C^^h{t)\X[-))] pro- 
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vides more than 90% of var(C'^^/i(t)). These observations suggest that we 
can use A^j./{T — 2bh — t)^ to approximate var(C'^_/^(t)). 

Proposition 3.5. Denote Xo{t) = X{t)/n, that is, E{\o{t)) = 1 and 
X{t) = /iAo(t). Suppose the law of {Ao(i),i G R} is fixed. Then under As- 
sumptions 1-3, for any fixed T, h and t, 

(3-7) 2^'/[-t)2 /^^^(^/^'fc(^)) ^ ^' as II ^oo, 

where A^j, is defined in Lemma 3.2. 

This proposition directly relates to real experimental data, especially 
those from fluorescence biophysical experiments. In such experiments the 
samples are usually placed under a laser beam, and the photon arrival in- 
tensity is proportional to the laser strength. To illuminate the sample, ex- 
perimenters usually use a strong laser. In this scenario, since the intrinsic 
molecular dynamics do not change, the law of {Ao(t)} remains the same, 
while /Lt is large. 

The approximation of A^rp/{T — 2bh — t)^ can be further simplified for 
practical use. First, because the bandwidth h is usually chosen to be small, 
and v{-, •, •) is a continuous function, A^j. approximately equals J Jj^^ r_fe/i_j]2 
v{t, t, s' — s) ds ds'. Second, since the process {A(s), s € R} is stationary, and 
to accurately estimate C{t), t is usually small compared to T (in order to 
have enough data), / /^^^^ ^_j^_jj2 ^^(i, s' — s) ds' approximately equals 
{T-2bh-t)-^ j-T-t-2fe/i^^ _ ^ _ Third, replacing 

v{t,t,T) = E{{X{0) - /.)(A(t) - /.)(A(r) - ^)(A(r + t)- /.)) - C\t) 
with its empirical counterpart c6v(t,r), which is 

1 



c6v(t, r) = max 



T-2bh-r-t 

r-T-bh-r-t 

X / (\h{s)- fi)(\h{s + t)- fi)(\h[r + s)- fl) 

Jbh 

X {Xh{r + s + t)-fi)ds- Clf,{t),0^ , 



results in the final approximation of var(C'^ /j(t)): 



2 



T-t-2bh 



Note that v{t,t,r) is typically nonnegative, so we force c6v(t,r) to be non- 
negative also. We will demonstrate the use of V{t) in Section 5. 
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(a) MSE(h) I for uniform density (b) Bandwidth selected by subsampling 

Fig. 3. (a) MSE(/i) vs. bandwidth using Biggie's (1985) method for data from the two-s- 
tate model, (b) The bandwidth selected by the subsampling procedure. The histogram is 
based on 100 i.i.d. replications of the Cox process from the two-state model. 



Comparison with existing methods. Diggle (1985) provided a procedure 
for selecting the bandwidth for estimating the arrival rate A(t) in the case of 

/ being the uniform kernel. In this procedure, based on an estimate MSE{h) 
of E{Xii{t) — A(t))^, the bandwidth is chosen to be the one that gives the 
smallest MSEQi). Figure 3(a) shows the standardized estimate MSE{h) / fi^ 
for the data of Figure 1(a) by this approach. However, this method is compu- 
tationally more intensive than our method, since it involves estimating MSE 
for all the possible bandwidths. Moreover, the MSE estimator provided by 
Diggle (1985) is only meant for the uniform kernel and does not generalize 
to other kernels. 

Guan (2007) has proposed a composite likelihood cross-validation ap- 
proach in selecting bandwidth for estimating the ACF. However, due to the 
large data size in our study (more than two million arrival points), this 
method is computationally too expensive to use (we found that the C pro- 
gram cannot even finish in an affordable time). 

We also applied the subsampling procedure of Guan, Sherman and Calvin 
(2004, 2006) to our data. Figure 3(b) shows the histogram of the bandwidths 
selected by the subsampling procedure based on 100 i.i.d. simulations from 
the two-state model in Section 2. We found that this procedure leads to a 
much larger bandwidth than /lopt proposed in Section 2, and, consequently, 
the estimates of C{t) have large bias, particularly for t close to zero. 

Compared to the existing methods, in terms of computational effort, our 
proposed method takes no more than five minutes to finish analyzing a 
process with more than two million data points, including estimating the 
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arrival rate and the autocorrelation function and constructing the confidence 
intervals. 

Connection with the classical kernel density estimate. Despite its similar- 
ity with the classical kernel density estimate, the kernel estimates Xhit) and 
Cfi^h{t) have several distinct features: (i) Since X{t) is stochastic, a consistent 
estimate of X{t) does not exist, (ii) In classical kernel problems, the num- 
ber of observations K does not depend on the underlying density, whereas 
the total number of observations in our case is random and depends on the 
stochastic process {X{t),t G [0,r]}. Consequently, (iii) consistency refers to 
the observational window T — >■ oo. (iv) The asymptotic behavior of the ker- 
nel estimate would depend on the distributional properties of X{t), as we 
shall see next. 

4. Asymptotic distribution of the kernel estimate. We investigate the 
limiting behavior of the kernel estimate C^,/i(i) in this section, since the 
asymptotic normality plays an important role in confidence interval con- 
struction. For well-behaved X(t), we can show that the asymptotic normality 
of C^^hit) holds. 

p-mixing arrival rate. Let J-t = a{X{s) :s <t) be the sigma field gen- 
erated by X{s) for s < t, and Qt = ^iX{s) :s > t) be the tail sigma field 
generated by A(,s) for s>t. Define 

(4.1) pt = sup{E{Cr]):^eTs,E^ = 0,m\<l;rjGgs+t,Er] = 0,\\rj\\<l}. 
X{t) is said to be finite /9-mixing if psds < oo [Billingsley (1999)]. 

Theorem 4.1. Suppose that Assumptions 1, 2, 4 <ind 5 hold, and that 
the arrival rate process {X{t),t £ R} is bounded and finite p-mixing. Then 
for fixed t,h>0, 

(4.2) VT[Cf,,h{t)-E{C^,hm^N{0,a\t,h)) asT^oo, 
where a'^{t,h) = lim.T--:>.ooT vai{Cf^^h{t)) ■ 

Theorem 4.2. Suppose that Assumptions 1 and 2 hold and that the 
stochastic process X{s) is a continuous-time Markov chain with finite number 
of states. Then for fixed t,h>0, the asymptotic normality (4-2) holds with 
a'^{t,h) = limT-^ooTvar{C^^h{t)). 

Theorem 4.2 covers a large class of arrival rates. Another class of pro- 
cesses, widely used in the physical science literature, is functionals of sta- 
tionary Gaussian processes: A(s) = g{W{s)), where 5 is a positive and con- 
tinuous function, and {W{s),s G R} is a zero-mean Gaussian process. We 
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will see that as long as the autocorrelation of W{s) decays reasonably fast, 
the asymptotic normality of C^^h{t) remains true. 

We consider, in particular, Gaussian processes of the form 

(4.3) {W{t),te R : W{t) = W[je) for t G [je, {j + l)e),j = 0, ±1,±2, . . .}, 

where e > is a fixed constant. In other words, we consider Gaussian pro- 
cesses generated piecewise by a discrete skeleton: {. . . ,W{—2e),W{—e), 
W{0),W{e),W{2e), . . .}, where W{je) is a discrete-time stationary zero- 
mean Gaussian process. The reason that we focus on this type of Gaussian 
process is two fold: first, if e is small enough, W can essentially approxi- 
mate any continuous stationary Gaussian process with arbitrary precision, 
and this is typically how one simulates a Gaussian process; second, the the- 
oretical calculations behind continuous-time Gaussian processes, especially 
those regarding mixing conditions, are quite delicate [see, e.g., Ibragimov 
and Rozanov (1978)], so to avoid drifting too much into the mathematical 
details and to present our proofs in a concise manner, we work on (4.3). We 
have the following result on functionals of Gaussian processes; its proof is 
given in the supplementary material [Zhang and Kou (2010)]. 

Theorem 4.3. Suppose that Assumptions 1, 2, 4 o,nd 5 hold, and that 
X(s) = g{W{s)), where g is a positive and bounded measurable function, 
and {W{s),s G R}, defined in (4-3), is generated from a discrete skeleton 
{WUe)}. If the ACF j{je) = cov(W(0), ^(je)) satisfies E^lo hij^)\ < 
then for any fixed t and h, the asymptotic normality (4-2) of C^^h{t) holds 
with a'^{t,h) =\iTaT^ooT YSiT{C fj,^h{t)) ■ 

Long-range dependent processes. Stochastic processes with a finite in- 
tegrated correlation C(t)dt < oo are said to be short-range dependent. 
Our results essentially say that for Cox processes with short-range depen- 
dent arrival rates, we expect the asymptotic normality of Cfj,^h{t) to hold, 
which offers a big advantage in the confidence interval construction. For 
long-range dependent arrival rates {f^ C{t) dt = oo), however, no easy con- 
clusion can be drawn about the asymptotic behavior of C^^h{t)- Even the 
form of limiting law varies from case to case. For example, the limiting pro- 
cess might be a fractional Brownian Motion [Whitt (2002)], a stable Levy 
motion [Whitt (2002)] or a Rosenblatt process [Taqqu (1975)]. Moreover, the 
variance of the limiting law, most likely, will not be the same as the limit of 

the variance [Taqqu (1975)], that is, Huit^oo var(C'^^/i(t) / ^J vav{C^^h{t))) 7^ 

var(limT_!.oo C^^h{i)/ \l var(C'^^/i(i))). Therefore, an interesting open problem 
is to investigate the asymptotic behavior of the Cox process estimates with 
long-range dependent arrival rates. 
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5. Numerical study of ACF estimation. We illustrate our method through 
several numerical examples — both simulation and real biophysical experi- 
mental data. We use the Epanechnikov kernel throughout this section. 



5.1. Simulation examples. 



Finite-state Markov chains. Since Theorem 4.2 guarantees the asymp- 
totic normality of the kernel ACF estimate, we can construct a pointwise 
1 — a approximate confidence interval (C.I.) of C{t) via 



(5.1) 



where C(i^h{t) and V{t) are given in (3.6) and (3.8) respectively. Following 
the discussion in Section 3.2, we use h = min(5//i, /lopt) for t < 26/iopt and 

h = hopt for t>2bhopt- 

We revisit the two-state Markov chain model (2.6) in Section 2. In this 
case, the true ACF is exponential: C{t) = (Aa — As)^/i;iA:2 exp(— (fci + k2)t)/ 
{ki + ^2)^. We applied the kernel estimator and (5.1) to the data set simu- 
lated in Section 2 [Figure 1(a)]. Figure 4(a) shows C/x^h{t) as the solid line, 
and the point-wise 95% C.I. as the dotted lines. The true ACF C{t), shown 
as the dashed line, is well recovered. Since C{t) usually decays quite fast, 
to highlight the details, especially around the tails, we plotted the estimate 
on the logarithm scale. We see from Figure 4(a) that logC^_/j(t) is linear 




(a) (b) 

Fig. 4. ACF estimation for two-state Markov chains. The left panel shows Cfi^h{t) (the 
time t IS in second) and the approximate 95% C.I. [normalized by Cp,.h{0)] based on one 
sequence of arrival data. The right panel shows the 2.5 and 97.5 percentiles of Cfi^hit) 
calculated from 1000 i.i.d. repetitions from the same model. 
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Table 3 

The coverage probabilities of the 95% C.I. (5.1) at various time points for different 
models based on 1000 i.i.d. repetitions. For reference, the standard deviation of a 
binomial proportion with success probability of 0.95 and 1000 trails is 0.0069 
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with t, indicating the exponential decay of C{t). As a check for the accuracy 
of the C.I., we repeated the data generation 1000 times independently. For 
each simulated data set, we calculated C^,h(0- The 2.5 and 97.5 percentiles 
of these repeated estimates Cp,^h{t) give the real 95% coverage of Cf^^hit), 
which is shown on Figure 4(b). Comparing the two panels, we see that the 
variance estimate based on just one realization is close to the truth. From 
the 1000 i.i.d. repetitions, we calculated the coverage probabilities of the 
95% C.I. (5.1) for various t. Table 3 (the second row) reports the numbers, 
which are close to the nominal 95%; Figure 5(a) plots them graphically. 

Log Gaussian processes. We next consider examples where the arrival 
rate X{t) follows a log Gaussian process: A(i) = M exp{W (t)} , where W{t) 
is a stationary zero- mean Gaussian process with the ACF 'y{t). As we men- 
tioned in Section 2, the log Gaussian process has been used to model the 
conformational dynamics and reactivity of enzyme molecules [Min et al. 




(a) Two state Markov chain (b) Log-Gaussian process with H = 6 

Fig. 5. The coverage probabilities of the 95% C.L (5.1) for t £ [0,10] (the time t is in 
second) under different models. 
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CI based on one simulation Percentiles from 1000 i.i.d. simulations 
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(a) (b) 

Fig. 6. ACF estimation for a short-range dependent log-Gaussian process with 
C{t) — 10® (exp(l/(l + + — e) (the time t is in second). The left panel shows Cfi^h{t) 
and the approximate 95% C.I. [normalized by Cp,.h{0)] based on one sequence of arrival 
data. The right panel shows the 2.5 and 97.5 percentiles of Cp^^hit) calculated from 1000 
i.i.d. replications from the same model. The total observational time T equals 1500. Both 
graphs are plotted on the log-log scale. 



(2005a); Kou and Xie (2004)]. As in Section 2, we take 7(t) = 1/(1 + 01*1)-^ 
so that C{t) = M^(exp(7(t) + 1) — e) decreases at the order of . Both a 
and H are positive constants. The larger the decay slope H , the faster the 
C{t) converges to zero and the faster the estimate converges to C{t). H also 
determines the dependence structure of the Cox process: if < 1, the pro- 
cess is long-range dependent. In the simulation, we generate W{t) through 
the discrete skeleton (4.3), and then draw the arrival times si, S2, ... on top 
of \{t). For each simulated arrival sequence, we calculate the kernel estimate 
Cf,,h{t) and the 95% C.I. (5.1). 

We consider two log Gaussian processes: one with H = 6 and a = 1, and 
the other with H = 0.5 and a = 20. In both cases, the maximum obser- 
vational time T = 1500 and the constant M is taken to be 1000 to mimic 
typical photon arrival data from a biophysical experiment. For the log Gaus- 
sian process with H = 6, Figure 6(a) plots Cjx,h{t) and the 95% approximate 
C.I. based on one data set. Figure 6(b) plots the 2.5 and 97.5 percentiles of 
Cfi^hit) from 1000 i.i.d. repetitions. For easy visual detection of the power 
law decay, the graph is plotted on a log-log scale. The similarity between 
the left and right panels indicates the effectiveness of our method. The real 
coverage probabilities of the 95% C.I. (5.1) are shown in Figure 5(b) and 
Table 3 (the third row). We see that the real coverage probabilities for the 
H = 6 case are close to the nominal 95%. Figure 6 also suggests that logC(i) 
is roughly linear with logt near the tail of the curve. 
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(a) (b) 

Fig. 7. ACF estimation for a long-range dependent log-Gaussian process with 
C{t) — 10''(exp(l/(l + 2Q\t\Y''^ + 1) — e) (the time t is m second). The left panel shows 
Cfi.hit) and the approximate 95% C.I. [normalized by Cp,.h{0)J based on one sequence of 
arrival data. The right panel shows the 2.5 and 97.5 percentiles of Cji.hit) calculated from 
1000 i.i.d. replications from the same model. The total observational time T equals 1500. 
Both graphs are plotted on the log-log scale. 



The log Gaussian process with H = 0.5 and a = 20 is long-range depen- 
dent. We can still calculate Cji^h{t) and the C.I. (5.1). However, since even 
asymptotic normality is no longer valid, one would expect the real cover- 
age to be way off. Figure 7 contrasts the "C.I." based on one data set with 
the true 2.5 and 97.5 percentiles of from 1000 i.i.d. replications. It 

is evident that although C^,/i(t) still estimates C{t) reasonably well, the 
"C.I." constructed from one data set is quite narrower than the true per- 
centiles. The last row of Table 3 shows that the real coverage probabilities of 
(5.1) in this long-range dependent case are much smaller than the nominal 
95% — clearly the asymptotic variance is underestimated. 

Static processes. Our method can be easily applied to detect static pro- 
cesses. When the underlying biological process is static, the photon arrival 
rate \{t) is a constant, and C(t) = for t > 0. In this case, we would observe 
that the arrival rate estimate \; (t) oscillates around a constant, and the 

"opt ^ ' ' 

ACF estimate Cji^h{t) clusters around zero. Figure 8 shows such an example 
with constant arrival rate \{t) = 500 and T = 500. 

5.2. Experimental photon arrival data. Studying the conformational dy- 
namics of proteins is of current biophysical interest. For example, scientists 
have become aware that an enzyme's conformational fluctuation can directly 
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Fig. 8. Analyzing Cox process data with constant arrival rate X{t) = 500 and T = 500. 
(a) Arrival rate estimate, (b) ACF estimate (normalized by jl^ ). 



affect its catalytic activity — certain conformations yield highly active cataly- 
sis, whereas others lead to less active catalysis [Lu, Xun and Xie (1998); En- 
glish et al. (2006)]. A recent single-molecule experiment [Yang et al. (2003)] 
investigates the conformational dynamics of a protein-enzyme compound 
Fre, which is involved in the DNA synthesis of E. Coli. In the experiment, 
the protein compound is immobilized and placed under a laser beam. Pho- 
tons from the laser-excited molecule are collected. Since the photon arrival 
rate depends on the molecule's time- varying three-dimensional conformation 
(different conformations of Fre generate different arrival rates), the sponta- 
neous conformation fluctuation of Fre leads to a stochastic arrival rate. The 
ACF of the photon arrival rates therefore reflects the time dependence of 
Fre's conformational fluctuation [Weiss (2000)]. 

The experimental photon arrival data has a total observational time T = 
354 seconds. The empirical mean arrival rate ft = 534.6 counts /second. We 
first estimated the arrival rate and showed it in Figure 9(a). This plot leads 
to a natural question regarding the nature of Fre's conformational fluctua- 
tion: does Fre have a small number of distinct conformation states or many? 
Looking at the decay of C(t) provides one way to address this question. 
We applied the bias-corrected C^,/i(i) to estimate C{t). Figure 9(b) shows 
Cfi^^hit) and its approximate 95% C.I. (5.1). We plotted the estimates on a 
log-log scale to give a better view of the decay of the ACF. The apparent 
linear pattern suggests a power-law relationship. If there are only two, three 
or even four conformation states, then C{t) should be a mixture of no more 
than three exponential functions. The apparent power-law relationship in- 
dicates a different picture: instead of having two, three or even four discrete 
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(a) (b) 

Fig. 9. Analyzing the photon arrival data from a single-molecule experiment with 
T = 354.3 and /i = 534.6. (a) Arrival rate estimate, (b) ACF estimate and its 95% confi- 
dence interval plotted on the log-log scale. 



conformation states, the 3D conformation of Fre appears to fluctuate over 
a continuum (as a check, we have attempted to parametrically fit a mixture 
of three exponentials to the estimated ACF, but even the best fitting is very 
poor), so the parametric finite state Markov Chain model cannot be applied 
here. The slow decay of the ACF thus points to a complicated conformation 
dynamic of Fre, which implies that the enzyme's catalytic rate could vary 
over a broad range, a phenomenon called dynamic disorder in the biophysics 
literature [Min et al. (2005a); Lerch, Rigler and Mikhailov (2005)]. 

Another recent single-molecule experiment [Min et al. (2005b)] also in- 
vestigates protein's conformational dynamics, studying a protein complex 
formed by fluorescein and monoclonal antifluorescein. This protein complex 
is an antibody-antigen system. Like the previous compound, the 3D con- 
formation of the molecule spontaneously fluctuates over time. To study the 
conformational dynamics, the immobilized protein complex was placed un- 
der a laser beam. Photons from the laser-excited molecule are collected. The 
photon arrival rate X{t) depends on the molecule's time-varying conforma- 
tion. Figure 10(a) shows the arrival rate estimates for this data, which have 
T = 1312.8 and fi = 1523.5. This plot seems to suggest that there are many 
conformation states in this antibody-antigen system. To further investigate, 
we applied Cp,^h{t) to estimate C{t). Figure 10(b) shows our estimate and 
the approximate 95% C.I. (5.1) on a log-log scale. Again, we observed a 
slow decay of the ACF. We attempted to parametrically fit a mixture of 
three exponentials to the estimated ACF but only obtained a very poor re- 
sult. Like the previous system, it appears that the 3D conformation of this 
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(a) (b) 

Fig. 10. Analyzing the photon arrival data from another single-molecule experiment with 
T = 1312.8 and fi = 1523.5. (a) Arrival rate estimate, (b) ACF estimate and its 95% 
confidence interval plotted on the log-log scale. 



antibody-antigen system fluctuates over a broad range rather than over just 
a few, say, three of four, discrete states. 

Since the second experiment is on a totahy different system from the first, 
our statistical results indicate that (i) conformational fluctuation could be 
widely present in protein systems; (ii) the fluctuation appears to be over a 
broad range of time scales. Our results thus support the growing understand- 
ing in the biophysics community that proteins' conformational fluctuation is 
a complex phenomenon, which in turn affects some crucial functions of pro- 
teins, such as enzyme catalysis [Lerch, Rigler and Mikhailov (2005); English 
et al. (2006)] and electron transfer in photosynthesis [Wang et al. (2007)]. 

6. Conclusion. Motivated by the analysis of experimental data from bio- 
physics, we propose a nonpar ametric kernel based method for inferring Cox 
process in this article, complementing existing parametric approaches. An 
important feature of the arrival data in biophysics is that the arrival rate is 
often large, which makes the methods developed for analyzing spatial point 
processes (which usually have low arrival rates), such as variance estimate, 
bandwidth selection and asymptotic theory, not directly applicable for our 
purpose. In addition to proposing the kernel estimates, we conduct a de- 
tailed study of their properties. We show that the asymptotic normality of 
our ACF estimates holds for most short-range dependent processes, which 
provides the theoretical underpinning for confidence interval construction. 
We provide an approximation of the variance of the ACF estimate, which 
accounts for at least 90% of the total variation in our examples. We can 
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possibly improve this approximation, for example, by taking into account 
the Poisson variation £J{var(C'^^ft,(t)| A(-))} part, which might be particularly 
beneficial when A(t) has a strong short-range dependence. 

We applied our nonparametric method to analyze two real photon arrival 
data produced in recent (single-molecule) biophysical experiments. Using 
our kernel ACF estimate, we examine the conformational dynamics of two 
different protein systems. We observed that the conformational fluctuation 
exhibits a long memory and spans a broad range of time scales, confirming 
the recent experimental discovery that the classical static picture of proteins 
that researchers used to assume needs to be revised. 

An important open question for future study is to investigate Cox pro- 
cesses with long-range dependent arrival rates. Another open question for 
our future investigation is the estimation of high-order correlations of the 
arrival rate, as biophysicists and chemists have used them to discriminate 
different mechanistic and phenomenological models [Mukamel (1995)]. 
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SUPPLEMENTARY MATERIAL 

Technical proofs (DOL 10.1214/10-AOAS352SUPP; .pdf). Technical proofs 
accompanying the paper "Nonparametric inference of doubly stochastic Pois- 
son process data via the kernel method" by Zhang and Kou. 
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